ON THE FLY GENERATION OF MULTIMEDIA CODE FOR IMAGE PROCESSING

Background of the Invention

Field of the Invention

The invention relates to the processing of multimedia data with processors that
feature multimedia instruction enhanced instruction sets. More particularly, the
invention relates to a method and apparatus for generating processor instruction
sequences for image processing routines that use multimedia enhanced
instructions.

Description of the Prior Art 

In general, most programs that use image processing routines with multimedia
instructions do not use a general-purpose compiler for these parts of the
program. These programs typically use assembly routines to process such data.
A resulting problem is that the assembly routines must be added to the code
manually. This step requires high technical skill, is time consuming, and is
prone to introducing errors into the code.

In addition, different types of processors (for example, Intel's Pentium I with
MMX, Pentium II, Pentium III and Willamette, and AMD's K-6 and K-7, a.k.a.
Athlon) each use different multimedia command sets. Examples of different
multimedia command sets are MMX, SSE and 3DNow. Applications that use these
multimedia command sets must have separate assembly routines that are
specifically written for each processor type.

At runtime, the applications select the proper assembly routines based on the
processor detected. To reduce the workload and increase the robustness of the
code, these assembly routines are sometimes generated by a routine specific
source code generator during program development.

One problem with this type of programming is that the applications must have
redundant assembly routines which can process the same multimedia data, but
which are written for the different types of processors. However, only one
assembly routine is actually used at runtime. Because there are many
generations of processors in existence, the size of applications that use
multimedia instructions must grow to be compatible with all of these processors.
In addition, as new processors are developed, new routines must be coded for
these applications so that they are compatible with the new processors. An
application that is released prior to the release of a processor is incompatible
with the processor unless it is first patched or rebuilt with the new assembly
routines.

It would be desirable to provide programs that use multimedia instructions which
are smaller in size. It would also be desirable to provide an approach that
adapts such programs to future processors more easily.

Summary of the Invention 

In accordance with the invention, a method and apparatus for generating
assembly routines for multimedia instruction enhanced data are shown and
described.

One example of multimedia data that can be processed by multimedia instructions
is the pixel blocks used in image processing. Most image processing routines
operate on rectangular blocks of evenly sized data pieces (e.g. 16x16 pixel
blocks of 8 bit video during MPEG motion compensation). The image processing
code is described as a set of source blocks, destination blocks and data
manipulations. Each block has a start address, a pitch (distance in bytes
between two consecutive lines) and a data format. The full processing code
includes width and height as additional parameters. All of these parameters can
either be integer constants or arguments to the generated routine. All data
operations are described on SIMD data types. A SIMD data type is a basic data
type (e.g. signed byte, signed word, or unsigned byte) and a number of repeats
(e.g. 16 pixels for MPEG macroblocks). The size of a block (source or
destination) is always the size of its SIMD data type times its width in
horizontal direction and the height in vertical direction.
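
The block geometry described above can be sketched in a few lines of C++. This
is only an illustration: the names BaseType, SimdType, blockBytes and
elementOffset are hypothetical and do not appear in the tables of this
document, and the sketch assumes that width is counted in SIMD elements and
height in lines.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical base element types, mirroring the examples in the text
// (unsigned byte, signed byte, signed word). Not part of the original tables.
enum class BaseType { U8, S8, S16 };

// Size in bytes of one element of the given base type.
std::size_t bytesPerElement(BaseType t) {
    return t == BaseType::S16 ? 2 : 1;
}

// A SIMD data type: a basic data type plus a number of repeats.
struct SimdType {
    BaseType base;
    int repeats;
};

// Size in bytes of one SIMD element (e.g. 16 x U8 = 16 bytes).
std::size_t bytesPerSimd(SimdType t) {
    assert(t.repeats > 0);
    return bytesPerElement(t.base) * static_cast<std::size_t>(t.repeats);
}

// Payload size of a block: SIMD size times width (in SIMD elements)
// times height (in lines). The pitch is not involved here.
std::size_t blockBytes(SimdType t, int width, int height) {
    assert(width > 0 && height > 0);
    return bytesPerSimd(t) * static_cast<std::size_t>(width)
                           * static_cast<std::size_t>(height);
}

// Byte offset of SIMD element x on line y, relative to the start address.
// The pitch is the distance in bytes between two consecutive lines.
std::size_t elementOffset(SimdType t, std::size_t pitch, int x, int y) {
    assert(x >= 0 && y >= 0);
    return static_cast<std::size_t>(y) * pitch
         + static_cast<std::size_t>(x) * bytesPerSimd(t);
}
```

For a 16x16 macroblock of 8 bit pixels, described as one SIMD element of
sixteen unsigned bytes per line, blockBytes yields 256 bytes, while the pitch
only determines where each of the sixteen lines starts in memory.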

In the presently preferred embodiment of the invention, an abstract image
generator inside the application program produces an abstract routine
representation of the code that operates on the multimedia data using SIMD
operations. A directed acyclic graph is a typical example of a generic version.
A translator then generates processor specific assembly code from the abstract
representation.

Brief Description of the Drawings 

FIG. 1 is a block diagram of a computer system that may be used to implement a
method and apparatus embodying the invention for translating a multimedia
routine from its abstract representation generated by an abstract routine
generator inside the application's startup code into executable code using the
code generator.

Description of the Preferred Embodiment

In Fig. 1, the startup code 11 of the application program 13, further referred
to as the abstract routine generator, generates an abstract representation 15 of
the multimedia routine, represented by a data flow graph. This graph is then
translated by the code generator 17 into a machine specific sequence of
instructions 19, typically including several SIMD multimedia instructions. The
types of operations that can be present inside the data flow graph include add,
sub, multiply, average, maximum, minimum, compare, and, or, xor, pack, unpack
and merge operations. This list is not exhaustive, as there are operations
currently performed by MMX, SSE and 3DNow, for example, which are not listed.
If a specific command set does not support one of these operations, the CPU
specific part of the code generator replaces it by a sequence of simpler
instructions (e.g. the maximum instruction can be replaced by a pair of subtract
and add instructions using saturation arithmetic).
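
The maximum replacement mentioned above can be modelled in scalar C++. The
sketch below is an assumption about how such a replacement may work for
unsigned 8 bit elements: a saturating subtract yields a - b when a is larger
and 0 otherwise, so adding b back gives the larger of the two values. The
function names are hypothetical, and a real translator would emit the
corresponding SIMD instructions rather than a scalar loop.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Unsigned saturating subtract: clamps at 0 instead of wrapping around.
std::uint8_t subSaturatedU8(std::uint8_t a, std::uint8_t b) {
    return a > b ? static_cast<std::uint8_t>(a - b) : static_cast<std::uint8_t>(0);
}

// Unsigned saturating add: clamps at 255 instead of wrapping around.
std::uint8_t addSaturatedU8(std::uint8_t a, std::uint8_t b) {
    unsigned sum = static_cast<unsigned>(a) + static_cast<unsigned>(b);
    return static_cast<std::uint8_t>(sum > 255u ? 255u : sum);
}

// max(a, b) expressed as a saturating subtract followed by an add.
std::uint8_t maxViaSubAdd(std::uint8_t a, std::uint8_t b) {
    return addSaturatedU8(subSaturatedU8(a, b), b);
}

// The same replacement applied element by element to a 16 byte vector.
using Vec16 = std::array<std::uint8_t, 16>;

Vec16 maxViaSubAddVec(const Vec16& a, const Vec16& b) {
    Vec16 result{};
    for (std::size_t i = 0; i < result.size(); ++i) {
        result[i] = maxViaSubAdd(a[i], b[i]);
    }
    return result;
}

// Exhaustive check of the identity against std::max for all byte pairs.
void verifyAllPairs() {
    for (int a = 0; a < 256; ++a) {
        for (int b = 0; b < 256; ++b) {
            assert(maxViaSubAdd(static_cast<std::uint8_t>(a),
                                static_cast<std::uint8_t>(b)) == std::max(a, b));
        }
    }
}
```

Because the intermediate difference never exceeds 255 - b, the final add cannot
overflow, which is why the two step sequence is exact for unsigned bytes.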

The abstract routine generator generates an abstract representation of the code,
commonly in the form of a directed acyclic graph, during runtime. This allows
the creation of multiple similar routines using a loop inside the image
processing code 21 for linear arrays, or the generation of routines on the fly
depending on user interaction. For example, the bi-directional MPEG-2 motion
compensation can be implemented using a set of sixty-four different but very
similar routines that can be generated by a loop in the abstract image
generator. Alternatively, an interactive paint program can generate filters or
pens in the form of abstract representations based on user input, and can use
the routine generator to create efficient code sequences to perform the
filtering or drawing operation. Examples of the data types processed by the code
sequences include SIMD input data, image input data and audio input data.
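
The count of sixty-four routines follows from six binary selectors (luma or
chroma, error correction present or not, and two half pel flags for each of the
two sources). A minimal sketch of such an enumeration is given below; the
Variant structure and both function names are hypothetical and serve only to
illustrate how one loop nest can cover every case.

```cpp
#include <cassert>
#include <vector>

// One bidirectional routine variant; every field is either 0 or 1.
// The field names follow the index names used in Table A.
struct Variant {
    int yuv, delta, half1y, half1x, half2y, half2x;
};

// Flatten the six binary selectors into a single index in [0, 64).
int variantIndex(const Variant& v) {
    assert((v.yuv | v.delta | v.half1y | v.half1x | v.half2y | v.half2x) <= 1);
    assert(v.yuv >= 0 && v.delta >= 0 && v.half1y >= 0 &&
           v.half1x >= 0 && v.half2y >= 0 && v.half2x >= 0);
    return ((((v.yuv * 2 + v.delta) * 2 + v.half1y) * 2 + v.half1x) * 2
            + v.half2y) * 2 + v.half2x;
}

// Enumerate all variants with one nested loop, in index order.
std::vector<Variant> enumerateBidirectionalVariants() {
    std::vector<Variant> variants;
    for (int yuv = 0; yuv < 2; ++yuv)
        for (int delta = 0; delta < 2; ++delta)
            for (int half1y = 0; half1y < 2; ++half1y)
                for (int half1x = 0; half1x < 2; ++half1x)
                    for (int half2y = 0; half2y < 2; ++half2y)
                        for (int half2x = 0; half2x < 2; ++half2x)
                            variants.push_back(
                                Variant{yuv, delta, half1y, half1x, half2y, half2x});
    return variants;
}
```

In the listings of Table B the body of the equivalent loop nest builds one
abstract graph per variant and hands it to the code converter, instead of merely
recording the selector values as this sketch does.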

Examples of information provided by the graphs include the source blocks, the
target blocks, the change in the block, color, stride, change in stride, display
block, and spatial filtering.

The accuracy of the operations inside the graphs can be tailored to meet the
requirements of the program. The abstract routine generator can increase its
precision by increasing the level of arithmetic per pixel. For example, 7-bit
processing can be stepped up to 8-bit, or 8-bit to 16-bit. For instance, motion
compensation routines with different types of rounding precision can be
generated by the abstract routine generator.
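
The text does not spell out what 7-bit and 8-bit processing mean in detail. One
plausible reading, shown below purely as an assumption, is the difference
between averaging two pixels after discarding the lowest bit of each input and
averaging them in a wider intermediate type with rounding to nearest. The
function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Reduced precision average: each input is halved first, so the lowest bit
// of both inputs is lost and only 7 bits per input contribute.
std::uint8_t avgTruncated7(std::uint8_t a, std::uint8_t b) {
    return static_cast<std::uint8_t>((a >> 1) + (b >> 1));
}

// Full precision average: the sum is formed in 16 bits and rounded to
// nearest before narrowing back to 8 bits.
std::uint8_t avgRounded8(std::uint8_t a, std::uint8_t b) {
    std::uint16_t sum = static_cast<std::uint16_t>(a + b + 1);
    return static_cast<std::uint8_t>(sum >> 1);
}

// Largest difference between the two variants over all input pairs.
int worstCaseDifference() {
    int worst = 0;
    for (int a = 0; a < 256; ++a) {
        for (int b = 0; b < 256; ++b) {
            int full = avgRounded8(static_cast<std::uint8_t>(a),
                                   static_cast<std::uint8_t>(b));
            int reduced = avgTruncated7(static_cast<std::uint8_t>(a),
                                        static_cast<std::uint8_t>(b));
            assert(full >= reduced);  // truncation never overshoots
            if (full - reduced > worst) worst = full - reduced;
        }
    }
    return worst;
}
```

Under this reading, the cheaper variant is off by at most one grey level, which
may be acceptable for a preview but not for a reference quality decoder; a
routine generator can emit either form from the same abstract description.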

The abstract representation, in this case the graph 15, is then sent to the
translator 17 where it is translated into optimized assembly code 19. The
translator uses standard compiler techniques to translate the generic graph
structure into a specific sequence of assembly instructions. As the description
is very generic, there is no link to a specific processor architecture, and
because it is very simple it can be processed without requiring complex compiler
techniques. This enables the translation to be executed during program startup
without causing a significant delay. Also, the abstract generator and the
translator do not have to be programmed in assembly. The CPU specific
translator may reside in a dynamic link library and can therefore be replaced if
the system processor is changed. This enables programs to use the multimedia
instructions of a new processor without the need to be changed.
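
To illustrate why a simple graph can be translated without complex compiler
techniques, the following sketch walks a small directed acyclic graph in post
order and emits one pseudo instruction per node. It is not the translator of
the invention: the Graph class, the Op values and the textual "r0 = load A"
output format are hypothetical, and register allocation is reduced to handing
out a fresh virtual register per node.

```cpp
#include <cassert>
#include <string>
#include <vector>

enum class Op { Load, Avg, Add };

// One node of the abstract graph. Inputs are indices of earlier nodes,
// which keeps the graph acyclic by construction.
struct Node {
    Op op;
    int lhs;            // first input node, or -1 for loads
    int rhs;            // second input node, or -1 for loads
    std::string block;  // source block name, used by loads only
    int reg;            // virtual register once translated, else -1
};

class Graph {
public:
    int load(const std::string& block) {
        nodes_.push_back(Node{Op::Load, -1, -1, block, -1});
        return static_cast<int>(nodes_.size()) - 1;
    }

    int binary(Op op, int lhs, int rhs) {
        assert(op != Op::Load);
        assert(valid(lhs) && valid(rhs));
        nodes_.push_back(Node{op, lhs, rhs, "", -1});
        return static_cast<int>(nodes_.size()) - 1;
    }

    // Translate the subgraph below root into a linear instruction list.
    std::vector<std::string> translate(int root) {
        assert(valid(root));
        for (Node& n : nodes_) n.reg = -1;
        std::vector<std::string> out;
        int nextReg = 0;
        emit(root, out, nextReg);
        return out;
    }

private:
    bool valid(int i) const {
        return i >= 0 && i < static_cast<int>(nodes_.size());
    }

    // Post order walk: inputs first, then the node itself. A node that is
    // shared by several consumers is emitted only once.
    int emit(int index, std::vector<std::string>& out, int& nextReg) {
        Node& n = nodes_[index];
        if (n.reg >= 0) return n.reg;
        if (n.op == Op::Load) {
            n.reg = nextReg++;
            out.push_back("r" + std::to_string(n.reg) + " = load " + n.block);
        } else {
            int a = emit(n.lhs, out, nextReg);
            int b = emit(n.rhs, out, nextReg);
            n.reg = nextReg++;
            out.push_back("r" + std::to_string(n.reg) + " = " +
                          (n.op == Op::Avg ? "avg" : "add") +
                          " r" + std::to_string(a) + ", r" + std::to_string(b));
        }
        return n.reg;
    }

    std::vector<Node> nodes_;
};
```

Building the half pel graph avg(avg(A, B), avg(C, D)) with this class and
calling translate on its root produces four loads and three averages in
dependency order. A CPU specific back end would map each pseudo instruction to
one or more real SIMD instructions at this point.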



Tables A-C provide sample code that generates an abstract representation for a
motion compensation code that can be translated to an executable code
sequence using the invention.



TABLE A 



#ifndef MPEG2MOTIONCOMPENSATION_H
#define MPEG2MOTIONCOMPENSATION_H

#include "driver\softwarecinemaster\common\prelude.h"
#include "..\..\BlockVideoProcessor\BVPXMMXCodeConverter.h"

//
// Basic block motion compensation functions
//
class MPEG2MotionCompensation
{
protected:
  //
  // Function prototype for a unidirectional motion compensation routine
  //
  typedef void (__stdcall * CompensationCodeType)(BYTE * source1Base,
                                                  int sourceStride,
                                                  BYTE * targetBase,
                                                  short * deltaBase,
                                                  int deltaStride,
                                                  int num);



  //
  // Function prototype for a bidirectional motion compensation routine
  //
  typedef void (__stdcall * BiCompensationCodeType)(BYTE * source1Base,
                                                    BYTE * source2Base,
                                                    int sourceStride,
                                                    BYTE * targetBase,
                                                    short * deltaBase,
                                                    int deltaStride,
                                                    int num);


  //
  // Motion compensation routines for unidirectional prediction. Each routine
  // handles one case. The indices are
  //  - y-uv  : if it is luma data the index is 0, otherwise 1
  //  - delta : error correction data is present (e.g. the block is not skipped)
  //  - halfy : half pel prediction is to be performed in vertical direction
  //  - halfx : half pel prediction is to be performed in horizontal direction
  //
  CompensationCodeType compensation[2][2][2][2];  // y-uv delta halfy halfx
  BVPCodeBlock * compensationBlock[2][2][2][2];

  //
  // Motion compensation routines for bidirectional prediction. Each routine
  // handles one case. The indices contain the same parameters as in the
  // unidirectional case, plus the half pel selectors for the second source
  //
  BiCompensationCodeType bicompensation[2][2][2][2][2][2];  // y-uv delta half1y half1x half2y half2x
  BVPCodeBlock * bicompensationBlock[2][2][2][2][2][2];


public:
  //
  // Perform a unidirectional compensation.
  //
  void MotionCompensation(BYTE * sourcep, int stride, BYTE * destp,
                          short * deltap, int dstride, int num,
                          bool uv, bool delta, int halfx, int halfy)
  {
    compensation[uv][delta][halfy][halfx](sourcep, stride, destp,
                                          deltap, dstride, num);
  }

  //
  // Perform bidirectional compensation
  //
  void BiMotionCompensation(BYTE * source1p, BYTE * source2p, int stride,
                            BYTE * destp, short * deltap, int dstride, int num,
                            bool uv, bool delta,
                            int half1x, int half1y, int half2x, int half2y)
  {
    bicompensation[uv][delta][half1y][half1x][half2y][half2x](source1p,
      source2p, stride, destp, deltap, dstride, num);
  }

  MPEG2MotionCompensation(void);
  ~MPEG2MotionCompensation(void);
};

#endif



TABLE B 

#include "MPEG2MotionCompensation.h"
#include "..\..\BlockVideoProcessor\BVPXMMXCodeConverter.h"



//
// Create the dataflow to fetch a data element from a source block,
// with or without half pel compensation in horizontal and/or
// vertical direction.
//
BVPDataSourceInstruction * BuildBlockMerge(BVPSourceBlock * source1BlockA,
                                           BVPSourceBlock * source1BlockB,
                                           BVPSourceBlock * source1BlockC,
                                           BVPSourceBlock * source1BlockD,
                                           int halfx, int halfy)
{
  if (halfy)
  {
    if (halfx)
    {
      //
      // Half pel prediction in h and v direction, the graph part looks like this
      //
      //           .--(LOAD source1BlockA)
      //          /
      //    .--(AVG)
      //   /      \
      //  /        `--(LOAD source1BlockB)
      // <--(AVG)
      //  \        .--(LOAD source1BlockC)
      //   \      /
      //    `--(AVG)
      //          \
      //           `--(LOAD source1BlockD)
      //
      return new BVPDataOperation
      (
        BVPDO_AVG,
        new BVPDataOperation
        (
          BVPDO_AVG,
          new BVPDataLoad(source1BlockA),
          new BVPDataLoad(source1BlockB)
        ),
        new BVPDataOperation
        (
          BVPDO_AVG,
          new BVPDataLoad(source1BlockC),
          new BVPDataLoad(source1BlockD)
        )
      );
    }
    else
    {
      //
      // Half pel prediction in vertical direction
      //
      //       .--(LOAD source1BlockA)
      //      /
      // <--(AVG)
      //      \
      //       `--(LOAD source1BlockC)
      //
      return new BVPDataOperation
      (
        BVPDO_AVG,
        new BVPDataLoad(source1BlockA),
        new BVPDataLoad(source1BlockC)
      );
    }
  }
  else
  {
    if (halfx)
    {
      //
      // Half pel prediction in horizontal direction
      //
      //       .--(LOAD source1BlockA)
      //      /
      // <--(AVG)
      //      \
      //       `--(LOAD source1BlockB)
      //
      return new BVPDataOperation
      (
        BVPDO_AVG,
        new BVPDataLoad(source1BlockA),
        new BVPDataLoad(source1BlockB)
      );
    }
    else
    {
      //
      // Full pel prediction
      //
      // <--(LOAD source1BlockA)
      //
      return new BVPDataLoad(source1BlockA);
    }
  }
}

MPEG2MotionCompensation::MPEG2MotionCompensation(void)
{
  int yuv, delta, halfy, halfx, half1y, half1x, half2y, half2x;
  BVPBlockProcessor * bvp;
  BVPCodeBlock * code;

  BVPArgument * source1Base;
  BVPArgument * source2Base;
  BVPArgument * sourceStride;
  BVPArgument * targetBase;
  BVPArgument * deltaBase;
  BVPArgument * deltaStride;
  BVPArgument * height;

  BVPSourceBlock * source1BlockA;
  BVPSourceBlock * source1BlockB;
  BVPSourceBlock * source1BlockC;
  BVPSourceBlock * source1BlockD;
  BVPSourceBlock * source2BlockA;
  BVPSourceBlock * source2BlockB;
  BVPSourceBlock * source2BlockC;
  BVPSourceBlock * source2BlockD;

  BVPSourceBlock * deltaBlock;
  BVPTargetBlock * targetBlock;

  BVPDataSourceInstruction * postMC;
  BVPDataSourceInstruction * postCorrect;
  BVPDataSourceInstruction * deltaData;

  //
  // Build unidirectional motion compensation routines
  //
  for (yuv = 0; yuv < 2; yuv++)
  {
    for (delta = 0; delta < 2; delta++)
    {
      for (halfy = 0; halfy < 2; halfy++)
      {
        for (halfx = 0; halfx < 2; halfx++)
        {
          bvp = new BVPBlockProcessor();

          bvp->AddArgument(height = new BVPArgument(false));
          bvp->AddArgument(deltaStride = new BVPArgument(false));
          bvp->AddArgument(deltaBase = new BVPArgument(true));
          bvp->AddArgument(targetBase = new BVPArgument(true));
          bvp->AddArgument(sourceStride = new BVPArgument(false));
          bvp->AddArgument(source1Base = new BVPArgument(true));

          //
          // Width is always sixteen pixels, so one vector of sixteen
          // unsigned eight bit elements,
          // height may vary, therefore it is an argument
          //
          bvp->SetDimension(1, height);

          //
          // Four potential source blocks, B is one pel to the right,
          // C one down and D right and down
          //
          bvp->AddSourceBlock(source1BlockA = new BVPSourceBlock(
            source1Base,
            sourceStride, BVPDataFormat(BVPDT_U8, 16), 0x10000));
          bvp->AddSourceBlock(source1BlockB = new BVPSourceBlock(
            BVPPointer(source1Base, 1 + yuv),
            sourceStride, BVPDataFormat(BVPDT_U8, 16), 0x10000));
          bvp->AddSourceBlock(source1BlockC = new BVPSourceBlock(
            BVPPointer(source1Base, sourceStride, 1, 0),
            sourceStride, BVPDataFormat(BVPDT_U8, 16), 0x10000));
          bvp->AddSourceBlock(source1BlockD = new BVPSourceBlock(
            BVPPointer(source1Base, sourceStride, 1, 1 + yuv),
            sourceStride, BVPDataFormat(BVPDT_U8, 16), 0x10000));

          //
          // If we have error correction data, we need this source block as well
          //
          if (delta)
            bvp->AddSourceBlock(deltaBlock = new BVPSourceBlock(
              deltaBase, deltaStride, BVPDataFormat(BVPDT_S16, 16), 0x10000));

          //
          // The target block to write the data into
          //
          bvp->AddTargetBlock(targetBlock = new BVPTargetBlock(
            targetBase, sourceStride, BVPDataFormat(BVPDT_U8, 16), 0x10000));

          //
          // Load a source block based on the half pel settings
          //
          bvp->AddInstruction(postMC = BuildBlockMerge(source1BlockA,
            source1BlockB, source1BlockC, source1BlockD, halfx, halfy));

          if (delta)
          {
            deltaData = new BVPDataLoad(deltaBlock);

            if (yuv)
            {
              //
              // It is chroma data and we have error correction data. The u and v
              // parts have to be interleaved, therefore we need the merge instruction
              //
              //                    .--(CONV S16)<--postMC
              //                   /
              // <--(CONV U8)<--(ADD)
              //                   \               .--(SPLIT H)<--.
              //                    \             /                \
              //                     `--(MERGE OE)                  >--(LOAD delta)
              //                                  \                /
              //                                   `--(SPLIT T)<--'
              //
              bvp->AddInstruction
              (
                postCorrect =
                  new BVPDataConvert
                  (
                    BVPDT_U8,
                    new BVPDataOperation
                    (
                      BVPDO_ADD,
                      new BVPDataConvert(BVPDT_S16, postMC),
                      new BVPDataMerge
                      (
                        BVPDM_ODDEVEN,
                        new BVPDataSplit(BVPDS_HEAD, deltaData),
                        new BVPDataSplit(BVPDS_TAIL, deltaData)
                      )
                    )
                  )
              );
            }

            else
            {
              //
              // It is luma data with error correction
              //
              //                    .--(CONV S16)<--postMC
              //                   /
              // <--(CONV U8)<--(ADD)
              //                   \
              //                    `--(LOAD delta)
              //
              bvp->AddInstruction
              (
                postCorrect =
                  new BVPDataConvert
                  (
                    BVPDT_U8,
                    new BVPDataOperation
                    (
                      BVPDO_ADD,
                      new BVPDataConvert(BVPDT_S16, postMC),
                      deltaData
                    )
                  )
              );
            }



            //
            // Store into the target block
            //
            // (STORE targetBlock)<--...
            //
            bvp->AddInstruction
            (
              new BVPDataStore(targetBlock, postCorrect)
            );
          }
          else
          {
            //
            // No error correction data, so store motion result into
            // target block
            //
            // (STORE targetBlock)<--...
            //
            bvp->AddInstruction
            (
              new BVPDataStore(targetBlock, postMC)
            );
          }



          BVPXMMXCodeConverter conv;

          //
          // Convert graph into machine language
          //
          compensationBlock[yuv][delta][halfy][halfx] = code =
            conv.Convert(bvp);

          //
          // Get function entry pointer
          //
          compensation[yuv][delta][halfy][halfx] =
            (CompensationCodeType)(code->GetCodeAddress());

          //
          // Delete graph
          //
          delete bvp;
        }
      }
    }
  }

  //
  // Build motion compensation routines for bidirectional prediction
  //
  for (yuv = 0; yuv < 2; yuv++)
  {
    for (delta = 0; delta < 2; delta++)
    {
      for (half1y = 0; half1y < 2; half1y++)
      {
        for (half1x = 0; half1x < 2; half1x++)
        {
          for (half2y = 0; half2y < 2; half2y++)
          {
            for (half2x = 0; half2x < 2; half2x++)
            {
              bvp = new BVPBlockProcessor();

              bvp->AddArgument(height = new BVPArgument(false));
              bvp->AddArgument(deltaStride = new BVPArgument(false));
              bvp->AddArgument(deltaBase = new BVPArgument(true));
              bvp->AddArgument(targetBase = new BVPArgument(true));
              bvp->AddArgument(sourceStride = new BVPArgument(false));
              bvp->AddArgument(source2Base = new BVPArgument(true));
              bvp->AddArgument(source1Base = new BVPArgument(true));

              bvp->SetDimension(1, height);

              //
              // We now have two source blocks, so we need eight blocks
              // for the half pel prediction
              //
              bvp->AddSourceBlock(source1BlockA = new BVPSourceBlock(
                source1Base,
                sourceStride, BVPDataFormat(BVPDT_U8, 16), 0x10000));
              bvp->AddSourceBlock(source1BlockB = new BVPSourceBlock(
                BVPPointer(source1Base, 1 + yuv),
                sourceStride, BVPDataFormat(BVPDT_U8, 16), 0x10000));
              bvp->AddSourceBlock(source1BlockC = new BVPSourceBlock(
                BVPPointer(source1Base, sourceStride, 1, 0),
                sourceStride, BVPDataFormat(BVPDT_U8, 16), 0x10000));
              bvp->AddSourceBlock(source1BlockD = new BVPSourceBlock(
                BVPPointer(source1Base, sourceStride, 1, 1 + yuv),
                sourceStride, BVPDataFormat(BVPDT_U8, 16), 0x10000));
              bvp->AddSourceBlock(source2BlockA = new BVPSourceBlock(
                source2Base,
                sourceStride, BVPDataFormat(BVPDT_U8, 16), 0x10000));
              bvp->AddSourceBlock(source2BlockB = new BVPSourceBlock(
                BVPPointer(source2Base, 1 + yuv),
                sourceStride, BVPDataFormat(BVPDT_U8, 16), 0x10000));
              bvp->AddSourceBlock(source2BlockC = new BVPSourceBlock(
                BVPPointer(source2Base, sourceStride, 1, 0),
                sourceStride, BVPDataFormat(BVPDT_U8, 16), 0x10000));
              bvp->AddSourceBlock(source2BlockD = new BVPSourceBlock(
                BVPPointer(source2Base, sourceStride, 1, 1 + yuv),
                sourceStride, BVPDataFormat(BVPDT_U8, 16), 0x10000));



              if (delta)
                bvp->AddSourceBlock(deltaBlock = new BVPSourceBlock(
                  deltaBase, deltaStride, BVPDataFormat(BVPDT_S16, 16), 0x10000));

              bvp->AddTargetBlock(targetBlock = new BVPTargetBlock(
                targetBase, sourceStride, BVPDataFormat(BVPDT_U8, 16), 0x10000));

              //
              // Build bidirectional prediction from two
              // unidirectional predictions
              //
              //       .--BuildBlockMerge(source1Block*)
              //      /
              // <--(AVG)
              //      \
              //       `--BuildBlockMerge(source2Block*)
              //
              bvp->AddInstruction
              (
                postMC =
                  new BVPDataOperation
                  (
                    BVPDO_AVG,
                    BuildBlockMerge(source1BlockA, source1BlockB,
                      source1BlockC, source1BlockD, half1x, half1y),
                    BuildBlockMerge(source2BlockA, source2BlockB,
                      source2BlockC, source2BlockD, half2x, half2y)
                  )
              );

              //
              // Apply error correction, see unidirectional case
              //
              if (delta)
              {
                deltaData = new BVPDataLoad(deltaBlock);

                if (yuv)
                {
                  bvp->AddInstruction
                  (
                    postCorrect =
                      new BVPDataConvert
                      (
                        BVPDT_U8,
                        new BVPDataOperation
                        (
                          BVPDO_ADD,
                          new BVPDataConvert(BVPDT_S16, postMC),
                          new BVPDataMerge
                          (
                            BVPDM_ODDEVEN,
                            new BVPDataSplit(BVPDS_HEAD, deltaData),
                            new BVPDataSplit(BVPDS_TAIL, deltaData)
                          )
                        )
                      )
                  );
                }

                else
                {
                  bvp->AddInstruction
                  (
                    postCorrect =
                      new BVPDataConvert
                      (
                        BVPDT_U8,
                        new BVPDataOperation
                        (
                          BVPDO_ADD,
                          new BVPDataConvert(BVPDT_S16, postMC),
                          deltaData
                        )
                      )
                  );
                }



                bvp->AddInstruction
                (
                  new BVPDataStore(targetBlock, postCorrect)
                );
              }
              else
              {
                bvp->AddInstruction
                (
                  new BVPDataStore(targetBlock, postMC)
                );
              }

              BVPXMMXCodeConverter conv;

              //
              // Translate routines
              //
              bicompensationBlock[yuv][delta][half1y][half1x][half2y][half2x] =
                code = conv.Convert(bvp);

              bicompensation[yuv][delta][half1y][half1x][half2y][half2x] =
                (BiCompensationCodeType)(code->GetCodeAddress());

              delete bvp;
            }
          }
        }
      }
    }
  }
}

MPEG2MotionCompensation::~MPEG2MotionCompensation(void)
{
  int yuv, delta, halfy, halfx, half1y, half1x, half2y, half2x;

  //
  // Free all motion compensation routines
  //
  for (yuv = 0; yuv < 2; yuv++)
  {
    for (delta = 0; delta < 2; delta++)
    {
      for (halfy = 0; halfy < 2; halfy++)
      {
        for (halfx = 0; halfx < 2; halfx++)
        {
          delete compensationBlock[yuv][delta][halfy][halfx];
        }
      }
    }
  }

  for (yuv = 0; yuv < 2; yuv++)
  {
    for (delta = 0; delta < 2; delta++)
    {
      for (half1y = 0; half1y < 2; half1y++)
      {
        for (half1x = 0; half1x < 2; half1x++)
        {
          for (half2y = 0; half2y < 2; half2y++)
          {
            for (half2x = 0; half2x < 2; half2x++)
            {
              delete bicompensationBlock[yuv][delta][half1y][half1x][half2y][half2x];
            }
          }
        }
      }
    }
  }
}



TABLE C

#ifndef BVPGENERIC_H
#define BVPGENERIC_H

#include "BVPList.h"

//
// Argument descriptor. An argument can be either a pointer or an integer used
// as a stride, offset or width/height value.
//
class BVPArgument
{
public:
  bool pointer;
  int index;

  BVPArgument(bool pointer_)
    : pointer(pointer_), index(0) {}
};



//
// Description of an integer value used as a stride or offset. An integer value
// can be either an argument or a constant
//
class BVPInteger
{
public:
  int value;
  BVPArgument * arg;

  BVPInteger(void)
    : value(0), arg(NULL) {}
  BVPInteger(int value_)
    : value(value_), arg(NULL) {}
  BVPInteger(unsigned value_)
    : value((int)value_), arg(NULL) {}
  BVPInteger(BVPArgument * arg_)
    : value(0), arg(arg_) {}

  bool operator==(BVPInteger i2)
  {
    return arg ? (i2.arg == arg) : (i2.value == value);
  }
};

//
// Description of a memory pointer used as a base for source and target blocks.
// A pointer can be a combination of a pointer base, a constant offset and
// a variable index with scaling
//
class BVPPointer
{
public:
  BVPArgument * base;
  BVPArgument * index;
  int offset;
  int scale;

  BVPPointer(BVPArgument * base_)
    : base(base_), index(NULL), offset(0), scale(0) {}

  BVPPointer(BVPPointer base_, int offset_)
    : base(base_.base), index(NULL), offset(offset_), scale(0) {}

  BVPPointer(BVPPointer base_, BVPInteger index_, int scale_, int offset_)
    : base(base_.base), index(index_.arg), offset(offset_), scale(scale_) {}
};

//
// Base data formats for scalar types
//

enum BVPBaseDataFormat
{
  BVPDT_U8,   // Unsigned 8 bits
  BVPDT_U16,  // Unsigned 16 bits
  BVPDT_U32,  // Unsigned 32 bits
  BVPDT_S8,   // Signed 8 bits
  BVPDT_S16,  // Signed 16 bits
  BVPDT_S32   // Signed 32 bits
};

//
// Data format descriptor for scalar and vector (multimedia SIMD) types.
// Each data type is a combination of a base type and a vector size.
// Scalar types are represented by a vector size of one.
//
class BVPDataFormat
{
public:
  BVPBaseDataFormat format;
  int num;

  BVPDataFormat(BVPBaseDataFormat _format, int _num = 1)
    : format(_format), num(_num) {}

  BVPDataFormat(void)
    : format(BVPDT_U8), num(0) {}

  // Takes a const reference so that temporaries can be copied as well
  BVPDataFormat(const BVPDataFormat & f)
    : format(f.format), num(f.num) {}

  BVPDataFormat operator*(int times)
    {return BVPDataFormat(format, num * times);}

  BVPDataFormat operator/(int times)
    {return BVPDataFormat(format, num / times);}

  // Table is indexed by BVPBaseDataFormat, in declaration order
  int BitsPerElement(void)
    {static const int sz[] = {8, 16, 32, 8, 16, 32}; return sz[format];}

  int BitsPerChunk(void) {return BitsPerElement() * num;}
};



//
// Operation codes for binary data operations that have the
// same operand type for both sources and the destination
//
enum BVPDataOperationCode
{
  BVPDO_ADD,            // add with wraparound
  BVPDO_ADD_SATURATED,  // add with saturation
  BVPDO_SUB,            // subtract with wraparound
  BVPDO_SUB_SATURATED,  // subtract with saturation
  BVPDO_MAX,            // maximum
  BVPDO_MIN,            // minimum
  BVPDO_AVG,            // average (includes rounding towards nearest)
  BVPDO_EQU,            // equal
  BVPDO_OR,             // binary or
  BVPDO_XOR,            // binary exclusive or
  BVPDO_AND,            // binary and
  BVPDO_ANDNOT,         // binary and not
  BVPDO_MULL,           // multiply keep lower half
  BVPDO_MULH            // multiply keep upper half
};

//
// Operations that extract a part of a data element
//
enum BVPDataSplitCode
{
  BVPDS_HEAD,  // extract first half
  BVPDS_TAIL,  // extract second half
  BVPDS_ODD,   // extract odd elements
  BVPDS_EVEN   // extract even elements
};

//
// Operations that combine two data elements
//
enum BVPDataMergeCode
{
  BVPDM_UPPERLOWER,  // chain first and second operands
  BVPDM_ODDEVEN      // interleave first and second operands
};

//
// Node types in the data flow graph
//
enum BVPInstructionType
{
  BVPIT_LOAD,      // load an element from a source block
  BVPIT_STORE,     // store an element into a target block
  BVPIT_CONSTANT,  // load a constant value
  BVPIT_SPLIT,     // split an element
  BVPIT_MERGE,     // merge two elements
  BVPIT_CONVERT,   // perform a data conversion
  BVPIT_OPERATION  // simple binary data operation
};



//
// Descriptor of a data block. Contains a base pointer, a stride (pitch), a
// format and an incrementor in vertical direction. The vertical block position
// can be incremented by a fraction or a multiple of the given pitch.
//
class BVPBlock
{
public:
  BVPPointer base;
  BVPInteger pitch;
  BVPDataFormat format;
  int yscale;
  int index;

  BVPBlock(BVPPointer _base, BVPInteger _pitch, BVPDataFormat _format,
           int _yscale)
    : base(_base), pitch(_pitch), format(_format), yscale(_yscale) {}
};

//
// Descriptor of a source block
//
class BVPSourceBlock : public BVPBlock
{
public:
  BVPSourceBlock(BVPPointer base, BVPInteger pitch, BVPDataFormat format,
                 int yscale)
    : BVPBlock(base, pitch, format, yscale) {}
};

//
// Descriptor of a target block
//
class BVPTargetBlock : public BVPBlock
{
public:
  BVPTargetBlock(BVPPointer base, BVPInteger pitch, BVPDataFormat format,
                 int yscale)
    : BVPBlock(base, pitch, format, yscale) {}
};

class BVPDataSource;
class BVPDataDrain;
class BVPDataInstruction;

//
// Source connection element of a node in the data flow graph. Each node in
// the graph contains one or no source connection. A source connection is
// the output of a node in the graph. Each source connection can be connected
// to any number of drain connections in other nodes of the flow graph. The
// source is the output side of a node.
//
class BVPDataSource
{
public:
  BVPDataFormat format;
  BVPList<BVPDataDrain *> drain;

  BVPDataSource(BVPDataFormat _format) : format(_format) {}

  virtual void AddInstructions(BVPList<BVPDataInstruction *> & instructions) {}
  virtual BVPDataInstruction * ToInstruction(void) {return NULL;}
};

//
// Drain connection element of a node in the data flow graph. Each node
// can have none, one or two drain connections (but only one drain object
// to represent both). Each drain connects to exactly one source on the
// target side. As each node can have only two inputs, each drain is connected
// (through the node) with two sources. The drain is the input side of a
// node.
//
class BVPDataDrain
{
public:
  BVPDataSource * source1;
  BVPDataSource * source2;

  BVPDataDrain(BVPDataSource * source1_, BVPDataSource * source2_ = NULL)
    : source1(source1_), source2(source2_) {}

  virtual BVPDataInstruction * ToInstruction(void) {return NULL;}
};



//
// Each node in the graph represents one abstract instruction. It has an
// instruction type that describes the operation of the node.
//
class BVPDataInstruction
{
public:
  BVPInstructionType type;
  int index;

  BVPDataInstruction(BVPInstructionType type_)
    : type(type_), index(-1) {}

  virtual ~BVPDataInstruction(void) {}

  virtual void AddInstructions(BVPList<BVPDataInstruction *> & instructions);
  virtual void GetOperationBits(int & minBits, int & maxBits);

  virtual BVPDataFormat GetInputFormat(void) = 0;
  virtual BVPDataFormat GetOutputFormat(void) = 0;

  virtual BVPDataSource * ToSource(void) {return NULL;}
  virtual BVPDataDrain * ToDrain(void) {return NULL;}
};



60 



//
// Node that is a data source
//
class BVPDataSourceInstruction : public BVPDataInstruction, public BVPDataSource
{
public:
    BVPDataSourceInstruction(BVPInstructionType type_, BVPDataFormat format_)
        : BVPDataInstruction(type_), BVPDataSource(format_) {}

    void GetOperationBits(int & minBits, int & maxBits);

    BVPDataFormat GetOutputFormat(void) {return format;}
    BVPDataFormat GetInputFormat(void) {return format;}

    BVPDataInstruction * ToInstruction(void) {return this;}
    BVPDataSource * ToSource(void) {return this;}
};

//
// Node that is a data source and has one or two sources connected to its drain
//
class BVPDataSourceDrainInstruction : public BVPDataSourceInstruction, public BVPDataDrain
{
public:
    BVPDataSourceDrainInstruction(BVPInstructionType type_, BVPDataFormat format_,
                                  BVPDataSource * source1_)
        : BVPDataSourceInstruction(type_, format_),
          BVPDataDrain(source1_)
        { source1->drain.Insert(this); }
    BVPDataSourceDrainInstruction(BVPInstructionType type_, BVPDataFormat format_,
                                  BVPDataSource * source1_, BVPDataSource * source2_)
        : BVPDataSourceInstruction(type_, format_),
          BVPDataDrain(source1_, source2_)
        { source1->drain.Insert(this); source2->drain.Insert(this); }
};

//
// Instruction to load data from a source block
//
class BVPDataLoad : public BVPDataSourceInstruction
{
public:
    BVPSourceBlock * block;
    int offset;

    BVPDataLoad(BVPSourceBlock * block_, int offset_ = 0)
        : BVPDataSourceInstruction(BVPIT_LOAD, block_->format),
          block(block_), offset(offset_) {}

    void AddInstructions(BVPList<BVPDataInstruction *> & instructions);
};

//
// Instruction to store data into a target block
//
class BVPDataStore : public BVPDataInstruction, public BVPDataDrain
{
public:
    BVPTargetBlock * block;

    BVPDataStore(BVPTargetBlock * block_, BVPDataSource * source)
        : BVPDataInstruction(BVPIT_STORE), BVPDataDrain(source),
          block(block_)
        { source->drain.Insert(this); }

    void AddInstructions(BVPList<BVPDataInstruction *> & instructions);

    BVPDataFormat GetOutputFormat(void) {return source1->format;}
    BVPDataFormat GetInputFormat(void) {return source1->format;}

    BVPDataInstruction * ToInstruction(void) {return this;}
    BVPDataDrain * ToDrain(void) {return this;}
};



//
// Instruction to load a constant
//
class BVPDataConstant : public BVPDataSourceInstruction
{
public:
    int value;

    BVPDataConstant(BVPDataFormat format, int value_)
        : BVPDataSourceInstruction(BVPIT_CONSTANT, format),
          value(value_) {}
};

//
// Instruction to split a data element
//
class BVPDataSplit : public BVPDataSourceDrainInstruction
{
public:
    BVPDataSplitCode code;

    BVPDataSplit(BVPDataSplitCode code_, BVPDataSource * source)
        : BVPDataSourceDrainInstruction(BVPIT_SPLIT, source->format / 2, source),
          code(code_) {}

    void AddInstructions(BVPList<BVPDataInstruction *> & instructions);

    BVPDataDrain * ToDrain(void) {return this;}

    BVPDataFormat GetInputFormat(void) {return source1->format;}
};

//
// Instruction to merge two data elements
//
class BVPDataMerge : public BVPDataSourceDrainInstruction
{
public:
    BVPDataMergeCode code;

    BVPDataMerge(BVPDataMergeCode code_, BVPDataSource * source1_,
                 BVPDataSource * source2_)
        : BVPDataSourceDrainInstruction(BVPIT_MERGE, source1_->format * 2,
                                        source1_, source2_),
          code(code_) {}

    void AddInstructions(BVPList<BVPDataInstruction *> & instructions);

    BVPDataDrain * ToDrain(void) {return this;}

    BVPDataFormat GetInputFormat(void) {return source1->format;}
};

//
// Instruction to convert the basic vector elements of a data element into
// a different format (e.g. from signed 16 bit to unsigned 8 bit).
//
class BVPDataConvert : public BVPDataSourceDrainInstruction
{
public:
    BVPDataConvert(BVPBaseDataFormat target, BVPDataSource * source)
        : BVPDataSourceDrainInstruction(BVPIT_CONVERT,
              BVPDataFormat(target, source->format.num), source) {}

    void AddInstructions(BVPList<BVPDataInstruction *> & instructions);

    BVPDataDrain * ToDrain(void) {return this;}

    BVPDataFormat GetInputFormat(void) {return source1->format;}
};

//
// Basic data manipulation operation from two sources to one drain.
//
class BVPDataOperation : public BVPDataSourceDrainInstruction
{
public:
    BVPDataOperationCode code;

    BVPDataOperation(BVPDataOperationCode code_, BVPDataSource * source1_,
                     BVPDataSource * source2_)
        : BVPDataSourceDrainInstruction(BVPIT_OPERATION, source1_->format,
                                        source1_, source2_),
          code(code_) {}

    void AddInstructions(BVPList<BVPDataInstruction *> & instructions);

    BVPDataDrain * ToDrain(void) {return this;}
};

//
// Descriptor for one image block processing routine. It contains the
// arguments, the size and the dataflow graph. On destruction of the block
// processor all arguments, blocks and instructions are also deleted.
//
class BVPBlockProcessor
{
public:
    BVPInteger width;
    BVPInteger height;

    BVPList<BVPBlock *> blocks;
    BVPList<BVPDataInstruction *> instructions;
    BVPList<BVPArgument *> args;

    BVPBlockProcessor(void)
    {
    }

    ~BVPBlockProcessor(void);

    //
    // Add an argument to the list of arguments. Please note that the
    // arguments are added in the reverse order of the C calling convention.
    //
    void AddArgument(BVPArgument * arg)
    {
        arg->index = args.Num();
        args.Insert(arg);
    }

    //
    // Set the dimension of the operation rectangle. The width and height can
    // either be constants or arguments to the routine.
    //
    void SetDimension(BVPInteger width, BVPInteger height)
    {
        this->width = width;
        this->height = height;
    }

    //
    // Add a source block to the processing
    //
    void AddSourceBlock(BVPSourceBlock * block)
    {
        block->index = blocks.Num();
        blocks.Insert(block);
    }

    //
    // Add a target block to the processing
    //
    void AddTargetBlock(BVPTargetBlock * block)
    {
        block->index = blocks.Num();
        blocks.Insert(block);
    }

    //
    // Add an instruction to the dataflow graph. All referenced instructions
    // will also be added to the graph if they are not yet part of it.
    //
    void AddInstruction(BVPDataInstruction * ins)
    {
        ins->AddInstructions(instructions);
    }

    void GetOperationBits(int & minBits, int & maxBits);
};



#endif 
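
The comments in the listing above describe a data flow graph in which each node registers itself with the sources it consumes, and in which adding one instruction also adds every instruction it references. The listing declares AddInstructions but does not show its body, so the following is only a minimal sketch of one way such a traversal could behave. The Node type, its name field, the use of std::vector in place of BVPList, and the function BuildExampleOrder are all hypothetical and are not part of the listing.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-in for the listing's instruction classes.
struct Node {
    std::string name;            // label used only for this illustration
    int index = -1;              // -1 means "not yet part of the graph"
    std::vector<Node *> inputs;  // sources feeding this node (zero, one or two)
    std::vector<Node *> drains;  // nodes that consume this node's output

    explicit Node(std::string name_, Node * in1 = nullptr, Node * in2 = nullptr)
        : name(std::move(name_))
    {
        // A second input without a first one would be malformed.
        assert(in1 != nullptr || in2 == nullptr);
        // Register with each source, mirroring source->drain.Insert(this).
        if (in1) { inputs.push_back(in1); in1->drains.push_back(this); }
        if (in2) { inputs.push_back(in2); in2->drains.push_back(this); }
    }

    // Assumed behavior: add all referenced inputs first, then this node,
    // and skip any node that has already been given an index.
    void AddInstructions(std::vector<Node *> & instructions)
    {
        if (index >= 0) return;
        for (Node * in : inputs) in->AddInstructions(instructions);
        index = static_cast<int>(instructions.size());
        instructions.push_back(this);
    }
};

// Builds the graph for "store((a + b) + a)" and returns the order in which
// the nodes end up in the instruction list.
std::vector<std::string> BuildExampleOrder()
{
    Node loadA("load a");
    Node loadB("load b");
    Node sum("sum", &loadA, &loadB);
    Node result("result", &sum, &loadA);   // "load a" is shared by two nodes
    Node store("store", &result);

    std::vector<Node *> instructions;
    store.AddInstructions(instructions);   // only the final node is added explicitly

    std::vector<std::string> order;
    for (Node * n : instructions) order.push_back(n->name);
    return order;
}
```

In this sketch only the store node is added explicitly. The loads and the two additions are pulled in through the recursion, and the shared load appears once because its index is already set when it is reached a second time.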




Although the invention is described herein with reference to the preferred
embodiment, one skilled in the art will readily appreciate that other applications
may be substituted for those set forth herein without departing from the spirit and
scope of the present invention. Accordingly, the invention should only be limited
by the claims included below.